Image encoding method, image decoding method, image encoding apparatus, image decoding apparatus, image encoding program, and image decoding program

ABSTRACT

An image encoding/decoding apparatus for performing encoding/decoding while predicting an image between different views, using a reference image for a view different from a processing target image and a reference depth map which is a depth map for an object of the reference image, when a multi-view image including images of a plurality of different views is encoded/decoded, includes a reference depth region setting unit configured to set a reference depth region which is a corresponding region on the reference depth map for processing target regions into which the processing target image is divided, and an inter-view prediction unit configured to generate an inter-view predicted image for the processing target region from the reference image using depth information in the reference depth region as depth information for the processing target region.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 14/652,673, filed Jun. 16, 2015, which is a 371 U.S. National Stage of International Application No. PCT/JP2013/084376, filed on Dec. 20, 2013, which claims the benefit of and priority to Japanese Patent Application No. 2012-284616, filed on Dec. 27, 2012. The disclosures of the above applications are incorporated herein by reference.

TECHNICAL FIELD

The present invention relates to an image encoding method, an image decoding method, an image encoding apparatus, an image decoding apparatus, an image encoding program, and an image decoding program for encoding and decoding a multi-view image.

BACKGROUND ART

Conventionally, multi-view images each including a plurality of images obtained by photographing the same object and background using a plurality of cameras are known. A moving image captured by the plurality of cameras is referred to as a multi-view moving image (multi-view video). In the following description, an image (moving image) captured by one camera is referred to as a “two-dimensional image (moving image),” and a group of two-dimensional images (two-dimensional moving images) obtained by photographing the same object and background using a plurality of cameras differing in position and/or direction (hereinafter referred to as a view) is referred to as a “multi-view image (multi-view moving image).”

A two-dimensional moving image has a high correlation in the time direction, and coding efficiency can be improved using the correlation. On the other hand, when cameras are synchronized, frames (images) corresponding to the same time in the videos of the cameras in a multi-view image or a multi-view moving image are frames (images) obtained by photographing the object and background in completely the same state from different positions, and thus there is a high correlation between the cameras (between different two-dimensional images of the same time). It is possible to improve coding efficiency by using this correlation in coding of a multi-view image or a multi-view moving image.

Here, conventional technology relating to encoding technology for two-dimensional moving images will be described. In many conventional two-dimensional moving-image encoding schemes including H.264, MPEG-2, and MPEG-4, which are international coding standards, highly efficient encoding is performed using technologies of motion-compensated prediction, orthogonal transform, quantization, and entropy encoding. For example, in H.264, encoding using a temporal correlation with a plurality of past or future frames is possible.

Details of the motion-compensated prediction technology used in H.264, for example, are disclosed in Non-Patent Document 1. An outline of the motion-compensated prediction technology used in H.264 will be described. The motion-compensated prediction of H.264 enables an encoding target frame to be divided into blocks of various sizes and enables the blocks to have different motion vectors and different reference images. Using a different motion vector in each block, highly precise prediction which compensates for a different motion of a different object is realized. On the other hand, prediction having high precision considering occlusion caused by a temporal change is realized using a different reference frame in each block.

Next, a conventional encoding scheme for multi-view images or multi-view moving images will be described. A difference between the multi-view image coding scheme and the multi-view moving-image coding scheme is that a correlation in the time direction is simultaneously present in a multi-view moving image in addition to the correlation between the cameras. However, the same method using the correlation between the cameras can be used in both cases. Therefore, a method to be used in coding multi-view moving images will be described here.

In order to use the correlation between the cameras in coding of multi-view moving images, there is a conventional scheme of encoding a multi-view moving image with high efficiency through “disparity-compensated prediction,” in which the motion-compensated prediction is applied to images captured by different cameras at the same time. Here, the disparity is a difference between positions at which the same portion of an object is present on the image planes of cameras arranged at different positions. FIG. 15 is a conceptual diagram illustrating the disparity occurring between the cameras. In the conceptual diagram illustrated in FIG. 15, the image planes of cameras having parallel optical axes face vertically downward. In this manner, the positions at which the same portion of the object is projected on the image planes of the different cameras are generally referred to as corresponding points.

In the disparity-compensated prediction, each pixel value of an encoding target frame is predicted from a reference frame based on the corresponding relationship, and a prediction residual thereof and disparity information representing the corresponding relationship are encoded. Because the disparity varies for every pair of target cameras and every position of the target cameras, it is necessary to encode disparity information for each region in which the disparity-compensated prediction is performed. Actually, in the multi-view moving-image coding scheme of H.264, a vector representing the disparity information is encoded for each block using the disparity-compensated prediction.

The corresponding relationship provided by the disparity information can be represented as a one-dimensional amount representing a three-dimensional position of an object, rather than a two-dimensional vector, based on epipolar geometric constraints by using camera parameters. Although there are various representations of information representing a three-dimensional position of the object, the distance from a reference camera to the object or a coordinate value on an axis which is not parallel to an image plane of the camera is normally used. The reciprocal of the distance may be used instead of the distance. In addition, because the reciprocal of the distance is information proportional to the disparity, two reference cameras may be set and a three-dimensional position may be represented as the amount of disparity between images captured by the cameras. Because there is no essential difference regardless of what expression is used, information representing three-dimensional positions is hereinafter expressed as a depth without such expressions being distinguished.

FIG. 16 is a conceptual diagram of epipolar geometric constraints. According to the epipolar geometric constraints, a point on an image of another camera corresponding to a point on an image of a certain camera is constrained to a straight line called an epipolar line. At this time, when a depth for a pixel of the image is obtained, a corresponding point is uniquely defined on the epipolar line. For example, as illustrated in FIG. 16, a corresponding point in an image of a second camera for the object projected at a position m in an image of a first camera is projected at a position m′ on the epipolar line when the position of the object in a real space is M′ and projected at a position m″ on the epipolar line when the position of the object in the real space is M″.
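For reference, the relationship just described can be written compactly in standard pinhole-camera notation. The symbols below (intrinsic matrices K_1 and K_2, rotation R and translation t from the first camera to the second, and tildes for homogeneous pixel coordinates) are conventions of this sketch rather than notation used elsewhere in this document:

```latex
% Corresponding point m' in the second image for a pixel m in the first
% image whose object lies at depth Z (distance along the first camera's
% optical axis):
\tilde{m}' \simeq K_2 \left( Z \, R \, K_1^{-1} \, \tilde{m} + t \right)
% As Z varies, m' sweeps the epipolar line. For parallel cameras with
% focal length f and baseline B, this reduces to a purely horizontal
% disparity inversely proportional to the depth:
d = \frac{f B}{Z}
```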

In Non-Patent Document 2, a highly precise predicted image is generated and efficient multi-view moving-image coding is realized by using this property and synthesizing a predicted image for an encoding target frame from a reference frame in accordance with three-dimensional information of each object given by a depth map (distance image) for the reference frame. Also, the predicted image generated based on the depth is referred to as a view-synthesized image, a view-interpolated image, or a disparity-compensated image.

Further, in Patent Document 1, it is possible to generate a view-synthesized image only for a necessary region by initially converting a depth map for a reference frame into a depth map for an encoding target frame and obtaining a corresponding point using the converted depth map. Thereby, when an image or moving image is encoded or decoded while the method of generating the predicted image is switched for every region of a frame serving as an encoding or decoding target, the processing amount for generating the view-synthesized image or the memory amount for temporarily accumulating the view-synthesized image is reduced.

PRIOR ART DOCUMENTS

Patent Document

-   Patent Document 1: Japanese Unexamined Patent Application, First Publication No. 2010-21844

Non-Patent Document

-   Non-Patent Document 1: ITU-T Recommendation H.264, “Advanced Video Coding for Generic Audiovisual Services,” March 2009.
-   Non-Patent Document 2: Shinya SHIMIZU, Masaki KITAHARA, Kazuto KAMIKURA, and Yoshiyuki YASHIMA, “Multi-view Video Coding based on 3-D Warping with Depth Map,” In Proceedings of Picture Coding Symposium 2006, SS3-6, April 2006.

SUMMARY OF INVENTION

Problems to be Solved by the Invention

According to the method disclosed in Patent Document 1, it is possible to obtain a corresponding pixel on a reference frame from pixels of an encoding target frame because a depth is obtained for the encoding target frame. Thereby, when the view-synthesized image is necessary in only a partial region of the encoding target frame, because the view-synthesized image is generated for only the designated region of the encoding target frame, it is possible to reduce the processing amount or the required memory amount compared to the case in which the view-synthesized image of one frame is constantly generated.

However, because it is necessary to synthesize a depth map for the encoding target frame from the depth map for the reference frame when the view-synthesized image for the entire encoding target frame is necessary, there is a problem in that the processing amount increases more than when the view-synthesized image is directly generated from the depth map for the reference frame.

The present invention has been made in view of such circumstances, and an objective of the invention is to provide an image encoding method, an image decoding method, an image encoding apparatus, an image decoding apparatus, an image encoding program, and an image decoding program that enable a view-synthesized image to be generated with a small amount of calculation, without significantly degrading the quality of the view-synthesized image, when the view-synthesized image of a processing target frame is generated.

Means for Solving the Problems

According to the present invention, there is provided an image decoding apparatus which performs decoding while predicting an image between different views using a reference image decoded for a view different from a decoding target image and a reference depth map which is a depth map for an object of the reference image when the decoding target image is decoded from encoded data of a multi-view image including images of a plurality of different views, the image decoding apparatus including: a reference depth region setting unit configured to set a reference depth region which is a corresponding region on the reference depth map for decoding target regions into which the decoding target image is divided; and an inter-view prediction unit configured to generate an inter-view predicted image for the decoding target region from the reference image using depth information in the reference depth region as depth information for the decoding target region.

The image decoding apparatus of the present invention may further include: a depth reference disparity vector setting unit configured to set a depth reference disparity vector which is a disparity vector for the reference depth map with respect to the decoding target region, wherein the reference depth region setting unit sets a region indicated by the depth reference disparity vector as the reference depth region.

In the image decoding apparatus of the present invention, the depth reference disparity vector setting unit may set the depth reference disparity vector using a disparity vector used when a region adjacent to the decoding target region is decoded.

In the image decoding apparatus of the present invention, the depth reference disparity vector setting unit may set the depth reference disparity vector using depth information for a region on the reference depth map having the same position as the decoding target region.

In the image decoding apparatus of the present invention, the inter-view prediction unit may set a representative depth using depth information within the corresponding reference depth region for every predicted region obtained by dividing the decoding target region and generate an inter-view predicted image for the decoding target region by generating a view-synthesized image from the representative depth and the reference image.

In the image decoding apparatus of the present invention, the inter-view prediction unit may set an image reference disparity vector which is a disparity vector for the reference image using depth information within the corresponding reference depth region for every predicted region obtained by dividing the decoding target region and generate an inter-view predicted image for the decoding target region by generating a disparity-compensated image using the image reference disparity vector and the reference image.

The image decoding apparatus of the present invention may further include: an image reference disparity vector accumulation unit configured to accumulate the image reference disparity vector; and a disparity prediction unit configured to generate predicted disparity information for a region adjacent to the decoding target region using the accumulated image reference disparity vector.

In the image decoding apparatus of the present invention, the disparity prediction unit may generate a depth reference disparity vector for a region adjacent to the decoding target region.

The image decoding apparatus of the present invention may further include: a correction disparity vector setting unit configured to set a correction disparity vector which is a vector for correcting the image reference disparity vector, wherein the inter-view prediction unit may generate the inter-view predicted image by generating a disparity-compensated image using the reference image and a vector obtained by correcting the image reference disparity vector with the correction disparity vector.

In the image decoding apparatus of the present invention, the correction disparity vector setting unit may set one vector as the correction disparity vector for the decoding target region.

The image decoding apparatus of the present invention may further include: a predicted region division setting unit configured to set region divisions within the decoding target region based on depth information within the reference depth region, wherein the inter-view prediction unit may designate a region obtained according to the region division as the predicted region.

According to the present invention, there is provided an image decoding method which performs decoding while predicting an image between different views using a reference image decoded for a view different from a decoding target image and a reference depth map which is a depth map for an object of the reference image when the decoding target image is decoded from encoded data of a multi-view image including images of a plurality of different views, the image decoding method including: a reference depth region setting step of setting a reference depth region which is a corresponding region on the reference depth map for decoding target regions into which the decoding target image is divided; and an inter-view prediction step of generating an inter-view predicted image for the decoding target region from the reference image using depth information in the reference depth region as depth information for the decoding target region.

According to the present invention, there is provided an image encoding apparatus which performs encoding while predicting an image between different views using a reference image encoded for a view different from an encoding target image and a reference depth map which is a depth map for an object of the reference image when a multi-view image including images of a plurality of different views is encoded, the image encoding apparatus including: a reference depth region setting unit configured to set a reference depth region which is a corresponding region on the reference depth map for encoding target regions into which the encoding target image is divided; and an inter-view prediction unit configured to generate an inter-view predicted image for the encoding target region from the reference image using depth information in the reference depth region as depth information for the encoding target region.

Further, according to the present invention, there is provided an image encoding method which performs encoding while predicting an image between different views using a reference image encoded for a view different from an encoding target image and a reference depth map which is a depth map for an object of the reference image when a multi-view image including images of a plurality of different views is encoded, the image encoding method including: a reference depth region setting step of setting a reference depth region which is a corresponding region on the reference depth map for encoding target regions into which the encoding target image is divided; and an inter-view prediction step of generating an inter-view predicted image for the encoding target region from the reference image using depth information in the reference depth region as depth information for the encoding target region.

The present invention includes an image encoding program for causing a computer to execute the image encoding method.

The present invention includes an image decoding program for causing a computer to execute the image decoding method.

Advantageous Effects of the Invention

According to the present invention, there is an advantageous effect in that, when a view-synthesized image of a processing target frame is generated using a depth map for a frame other than the processing target frame, it is possible to omit the process of generating a depth map for the processing target frame by directly referring to and employing the depth map for the frame other than the processing target frame, and thereby to generate the view-synthesized image with a small amount of calculation.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating a configuration of an image encoding apparatus in an embodiment of the present invention.

FIG. 2 is a flowchart illustrating an operation of the image encoding apparatus illustrated in FIG. 1.

FIG. 3 is a flowchart illustrating a detailed processing operation of a process (step S14) of generating a view-synthesized image for a block blk illustrated in FIG. 2.

FIG. 4 is a block diagram illustrating a modified example of the image encoding apparatus illustrated in FIG. 1.

FIG. 5 is a flowchart illustrating a modified example of an operation of the image encoding apparatus illustrated in FIG. 1.

FIG. 6 is a block diagram illustrating another modified example of the image encoding apparatus illustrated in FIG. 1.

FIG. 7 is a block diagram illustrating a configuration of an image decoding apparatus in an embodiment of the present invention.

FIG. 8 is a flowchart illustrating an operation of the image decoding apparatus illustrated in FIG. 7.

FIG. 9 is a block diagram illustrating a modified example of the image decoding apparatus illustrated in FIG. 7.

FIG. 10 is a flowchart illustrating a modified example of an operation of the image decoding apparatus illustrated in FIG. 7.

FIG. 11 is a flowchart illustrating another modified example of the operation of the image decoding apparatus illustrated in FIG. 7.

FIG. 12 is a block diagram illustrating another modified example of the image decoding apparatus illustrated in FIG. 7.

FIG. 13 is a block diagram illustrating a hardware configuration when the image encoding apparatus is constituted of a computer and a software program.

FIG. 14 is a block diagram illustrating a hardware configuration example when the image decoding apparatus is constituted of a computer and a software program.

FIG. 15 is a conceptual diagram of disparity which occurs between cameras.

FIG. 16 is a conceptual diagram of epipolar geometric constraints.

EMBODIMENTS FOR CARRYING OUT THE INVENTION

Hereinafter, an image encoding apparatus and an image decoding apparatus according to embodiments of the present invention will be described with reference to the drawings. In the following description, the case in which a multi-view image captured by a first camera (referred to as a camera A) and a second camera (referred to as a camera B) is encoded is assumed, and an image of the camera B is described as being encoded or decoded by designating an image of the camera A as a reference image.

Also, information necessary for obtaining a disparity from depth information is assumed to be separately assigned. Specifically, this information is an external parameter representing a positional relationship of the cameras A and B or an internal parameter representing projection information for an image plane by the camera; however, other information may be assigned in other forms as long as a disparity can be obtained from the depth information. A detailed description relating to these camera parameters, for example, is disclosed in Reference Document <Olivier Faugeras, “Three-Dimensional Computer Vision,” MIT Press; BCTC/UFF-006.37 F259 1993, ISBN: 0-262-06158-9>. In this document, a description relating to a parameter representing a positional relationship of a plurality of cameras and a parameter representing projection information for an image plane by a camera is disclosed.

In the following description, when information capable of specifying a position (a coordinate value or an index that can be associated with a coordinate value) is appended between symbols [ ] to an image, video frame, or depth map, it is assumed to represent the image signal sampled at the pixel of that position or the depth corresponding to that image signal. In addition, the addition of a vector to a coordinate value or to an index value capable of corresponding to a block is assumed to represent the coordinate value or block at a position obtained by shifting the coordinates or block by the amount of the vector.

FIG. 1 is a block diagram illustrating a configuration of an image encoding apparatus in this embodiment. As illustrated in FIG. 1, the image encoding apparatus 100 includes an encoding target image input unit 101, an encoding target image memory 102, a reference image input unit 103, a reference image memory 104, a reference depth map input unit 105, a reference depth map memory 106, a disparity vector setting unit 107, a view-synthesized image generating unit 108, and an image encoding unit 109.

The encoding target image input unit 101 inputs an image serving as an encoding target. Hereinafter, the image serving as the encoding target is referred to as an encoding target image. Here, the image of the camera B is assumed to be input. In addition, a camera (here, the camera B) capturing the encoding target image is referred to as an encoding target camera.

The encoding target image memory 102 stores the input encoding target image. The reference image input unit 103 inputs an image to be referred to when the view-synthesized image (disparity-compensated image) is generated. Hereinafter, the image input here is referred to as a reference image. Here, an image of the camera A is assumed to be input.

The reference image memory 104 stores the input reference image. Hereinafter, the camera (here, the camera A) capturing the reference image is referred to as a reference camera.

The reference depth map input unit 105 inputs a depth map to be referred to when a view-synthesized image is generated. Here, although the depth map for the reference image is assumed to be input, the depth map for another camera may also be input. Hereinafter, this depth map is referred to as a reference depth map.

The depth map represents the three-dimensional position of the object shown in each pixel of the corresponding image. As long as the three-dimensional position is obtained using information such as a separately assigned camera parameter, any information may be used. For example, it is possible to use the distance from the camera to the object, a coordinate value for an axis which is not parallel to the image plane, or a disparity amount for another camera (for example, the camera B). In addition, because it is only necessary to obtain a disparity amount here, a disparity map directly representing the disparity amount rather than a depth map may be used. In addition, although the depth map is given in the form of an image here, the depth map may not be configured in the form of an image as long as similar information can be obtained. The reference depth map memory 106 stores the input reference depth map. Hereinafter, a camera (here, the camera A) corresponding to the reference depth map is referred to as a reference depth camera.

The disparity vector setting unit 107 sets a disparity vector for the reference depth map for every encoding target frame or every block obtained by dividing the encoding target frame. The view-synthesized image generating unit 108 (inter-view prediction unit) obtains a corresponding relationship between a pixel of the encoding target image and a pixel of the reference image using the reference depth map and generates a view-synthesized image for the encoding target image. The image encoding unit 109 outputs a bitstream which is encoded data obtained by performing predictive encoding on the encoding target image using the view-synthesized image.

Next, an operation of the image encoding apparatus 100 illustrated in FIG. 1 will be described with reference to FIG. 2. FIG. 2 is a flowchart illustrating the operation of the image encoding apparatus 100 illustrated in FIG. 1. The encoding target image input unit 101 inputs an encoding target image and stores the input encoding target image in the encoding target image memory 102 (step S11). Next, the reference image input unit 103 inputs a reference image and stores the input reference image in the reference image memory 104. In parallel with this, the reference depth map input unit 105 inputs a reference depth map and stores the input reference depth map in the reference depth map memory 106 (step S12).

Also, the reference image and the reference depth map input in step S12 are assumed to be the same as those to be obtained by the decoding side, such as a reference image and a reference depth map obtained by decoding an already encoded reference image and reference depth map. This is because the occurrence of encoding noise such as drift is suppressed by using exactly the same information as that obtained by the decoding apparatus. However, when the occurrence of such encoding noise is allowed, content obtained by only the encoding side, such as content before encoding, may be input. In relation to the reference depth map, in addition to content obtained by decoding already encoded content, a depth map estimated by applying stereo matching or the like to a multi-view image decoded for a plurality of cameras, a depth map estimated using a decoded disparity vector, motion vector, or the like, or the like may be used as a depth map to be equally obtained by the decoding side.

Next, the image encoding apparatus 100 encodes the encoding target image while creating a view-synthesized image for every block obtained by dividing the encoding target image. That is, after a variable blk indicating an index of a block of the encoding target image is initialized to 0 (step S13), the following process (steps S14 and S15) is iterated until blk reaches numBlks (step S17) while blk is incremented by 1 (step S16). Also, numBlks indicates the number of unit blocks on which the encoding process is performed in the encoding target image.

In the process to be performed for every block of the encoding target image, first, a view-synthesized image for the block blk is generated in the disparity vector setting unit 107 and the view-synthesized image generating unit 108 (step S14). This process will be described in detail below.

Next, after the view-synthesized image is obtained, the image encoding unit 109 performs predictive encoding on the encoding target image to output a predictive encoding result using the view-synthesized image as a predicted image (step S15). The bitstream obtained as a result of encoding becomes an output of the image encoding apparatus 100. Also, as long as decoding is able to be correctly performed in the decoding side, any method may be used in encoding.

In general moving-image encoding or image encoding such as MPEG-2, H.264, or Joint Photographic Experts Group (JPEG) encoding, encoding is performed by generating a difference signal between the encoding target image and the predicted image for every block, performing a frequency transform such as a discrete cosine transform (DCT) on the difference image, and sequentially applying the processes of quantization, binarization, and entropy encoding to the values obtained as a result.
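As a rough illustration of this pipeline, the following minimal sketch encodes one block along the lines described above; it is not the codec specified in this document, and the quantization step size, the use of SciPy's DCT, and the function names are assumptions of the sketch (entropy encoding is left abstract):

```python
import numpy as np
from scipy.fftpack import dct


def encode_block(target_block: np.ndarray, predicted_block: np.ndarray,
                 qstep: float = 8.0) -> np.ndarray:
    """Residual -> 2-D DCT -> uniform quantization, per the pipeline above."""
    residual = target_block.astype(np.float64) - predicted_block
    coeffs = dct(dct(residual, axis=0, norm='ortho'), axis=1, norm='ortho')
    # The quantized coefficients would then be binarized and entropy encoded.
    return np.round(coeffs / qstep).astype(np.int32)
```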

Although a view-synthesized image is used as a predicted image in all blocks in this embodiment, an image generated by a different method for every block may be used as a predicted image. In this case, it is necessary to discriminate, in the decoding side, the method by which a generated image is used as a predicted image. For example, a configuration may be made so that the method can be discriminated in the decoding side by encoding information indicating the method (mode or vector information or the like) of generating the predicted image and including the encoded information in a bitstream, as in H.264.

Next, processing operations of the disparity vector setting unit 107 and the view-synthesized image generating unit 108 illustrated in FIG. 1 will be described with reference to FIG. 3. FIG. 3 is a flowchart illustrating a detailed processing operation of the process (step S14) of generating a view-synthesized image for a block blk (encoding target region) obtained by dividing the encoding target image illustrated in FIG. 2. First, the disparity vector setting unit 107 (reference depth region setting unit) sets a disparity vector dv (depth reference disparity vector) indicating the block (reference depth region) on the reference depth map corresponding to the block blk (step S1401; a reference depth region setting step and a depth reference disparity vector setting step). Although the disparity vector may be set using any method, it is necessary to obtain the same disparity vector in the decoding side.

The disparity vector dv, for example, may be obtained from a depth value of the reference depth map at the same position as the block blk. Specifically, a maximum value, a minimum value, a median value, an average value, or the like among the depth values present within the block of the reference depth map having the same position as the block blk may be used for the disparity vector dv. In addition, the disparity vector may be obtained using only the depth values for specific pixels, such as the pixels located at the center and the four apexes, rather than the depth values for all pixels within the block on the reference depth map having the same position as the block blk.
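A minimal sketch of this option follows, assuming integer block coordinates; depth_to_disparity is a hypothetical helper standing in for the camera-parameter-based conversion described later, and all names here are illustrative rather than taken from this document:

```python
import numpy as np


def set_depth_reference_disparity_vector(ref_depth_map, y, x, h, w,
                                         depth_to_disparity, mode='max'):
    """Set dv from the block of the reference depth map co-located with blk."""
    depths = ref_depth_map[y:y + h, x:x + w]
    # One representative depth per block, chosen as described above; a
    # variant could sample only the center and the four corner pixels.
    reducer = {'max': np.max, 'min': np.min,
               'median': np.median, 'average': np.mean}[mode]
    return depth_to_disparity(reducer(depths))  # the disparity vector dv
```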

In addition, as another method, an arbitrary vector may be set as the disparity vector by performing a search on the reference depth map, and the decoding side may be notified by encoding the set disparity vector. In this case, as illustrated in FIG. 4, it is only necessary for the image encoding apparatus 100 to further include a disparity vector encoding unit 110 and a multiplexing unit 111. FIG. 4 is a block diagram illustrating a modified example of the image encoding apparatus 100 illustrated in FIG. 1. The disparity vector encoding unit 110 encodes the disparity vector set by the disparity vector setting unit 107, and the multiplexing unit 111 multiplexes a bitstream of the disparity vector with a bitstream of the encoding target image and outputs the multiplexing result.

Also, a global disparity vector may be set for every large unit such as a frame or a slice, without setting and encoding the disparity vector for every block, and the set global disparity vector may be used as the same disparity vector in all blocks within the frame or slice. In this case, as illustrated in FIG. 5, it is only necessary to set a disparity vector for the reference depth map (step S18) before the process to be performed for every block (before step S13) and to skip step S1401 illustrated in FIG. 3. FIG. 5 is a flowchart illustrating a modified example of the operation illustrated in FIG. 2.

The global disparity vector may be set using various methods. For example, the vector may be obtained by regarding the overall region for setting a global disparity vector as one block and performing block matching. In addition, one global disparity vector may be obtained by dividing the overall region for setting the global disparity vector into a plurality of blocks and selecting a most likely vector from the plurality of vectors obtained by performing block matching for every block. In addition, one depth value may be obtained by analyzing the depth values for a region on the reference depth map having the same position as the set region, and a disparity vector corresponding to the depth value may be specified as the global disparity vector.
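The first of these options might look as follows; the purely horizontal search, its range, the sum-of-absolute-differences cost, and the sign convention are all assumptions of this sketch:

```python
import numpy as np


def global_disparity_by_block_matching(target_region, reference, x0,
                                       max_disp=64):
    """Treat the whole region (left edge at column x0 in both images) as one
    block and search a horizontal disparity against the reference image."""
    h, w = target_region.shape
    best_d, best_cost = 0, np.inf
    for d in range(min(max_disp, x0) + 1):
        candidate = reference[:h, x0 - d:x0 - d + w]
        cost = np.abs(target_region.astype(np.int64)
                      - candidate.astype(np.int64)).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return np.array([-best_d, 0])  # global disparity vector, (x, y) order
```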

As still another method, the disparity vector for the block blk may be set from vector information encoded in a block encoded before the block blk is encoded. Specifically, when disparity-compensated prediction is used when a block or a frame spatially or temporally adjacent to the block blk is encoded, some disparity vectors are encoded in that block. Accordingly, the disparity vector for the block blk may be obtained from these disparity vectors according to a predetermined method.

As the predetermined method, there is a method of performing median prediction from the disparity vectors of adjacent blocks or a method of using the disparity vector of a specific block without change. In this case, as illustrated in FIG. 6, it is only necessary for the image encoding apparatus 100 to further include a vector information memory 112. FIG. 6 is a block diagram illustrating another modified example of the image encoding apparatus 100 illustrated in FIG. 1. The vector information memory 112 accumulates the vector information used when the image encoding unit 109 generates a predicted image. The accumulated vector information is used when the disparity vector setting unit 107 sets a disparity vector for another block blk.
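A sketch of the median prediction option follows; choosing the left, above, and above-right neighbours mirrors the motion vector prediction of H.264 and is an assumption here:

```python
import numpy as np


def median_predict_disparity(left_vec, above_vec, above_right_vec):
    """Component-wise median of the neighbouring blocks' disparity vectors."""
    candidates = np.stack([left_vec, above_vec, above_right_vec])
    return np.median(candidates, axis=0).astype(candidates.dtype)
```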

In addition, this method may be combined with the above-described method of setting an arbitrary vector as the disparity vector by encoding it. For example, a difference vector between the set arbitrary vector and a vector estimated from vector information encoded in a block encoded before the block blk may be generated, and the difference vector may be encoded.

Returning to FIG. 3, after the disparity vector for the block blk has been set, the view-synthesized image is generated for every sub-block obtained by dividing the block blk. That is, after a variable sblk indicating an index of the sub-block is initialized to 0 (step S1402), the following process (steps S1403 to S1405) is iterated until sblk reaches numSBlks (step S1407) while sblk is incremented by 1 (step S1406).

Here, numSBlks indicates the number of sub-blocks within the block blk.

Also, although it is possible to use various sizes or shapes for a sub-block, the same sub-block division is required to be obtained in the decoding side. For the size of the sub-block, for example, a predetermined division such as 2 pixels×2 pixels, 4 pixels×4 pixels, or 8 pixels×8 pixels (length×width) may be used. Also, 1 pixel×1 pixel (that is, every pixel) or the same size as that of the block blk (that is, no division) may be used as the predetermined division.

As another method of obtaining the same sub-block division as the decoding side, a sub-block division method may be encoded and a notification of the method may be provided to the decoding side. In this case, the bitstream for the sub-block division method is multiplexed with the bitstream of the encoding target image, and the multiplexed bitstream becomes part of the bitstream to be output by the image encoding apparatus 100. Also, when the sub-block division method is selected, it is possible to generate a high-quality predicted image in a small processing amount in the process of generating a view-synthesized image to be described below by selecting a method in which the pixels included in one sub-block have the same disparity for the reference image as much as possible and are divided into as few sub-blocks as possible. Also, in this case, information indicating the sub-block division is decoded from the bitstream in the decoding side, and the sub-block division is performed according to a method based on the decoded information.

As still another method, the sub-block division (a region division within the encoding target region) may be determined from the depth for the block blk+dv on the reference depth map indicated by the disparity vector dv set in step S1401 (predicted region division setting step). For example, it is possible to obtain the sub-block division by clustering the depth of the block blk+dv of the reference depth map, as sketched below. In addition, a configuration may be made to select the division in which the depth is most correctly classified from among predetermined types of divisions, without performing clustering.
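A minimal sketch of such a depth-driven division follows; the two-class clustering (thresholding at the mean depth) and the fixed candidate set of divisions (none, horizontal, vertical) are assumptions of the sketch, not divisions mandated by this document:

```python
import numpy as np


def choose_subblock_division(depth_block):
    """Pick the candidate division that best separates the two depth classes."""
    labels = depth_block > depth_block.mean()  # crude two-class clustering
    h, w = labels.shape

    def impurity(parts):
        # pixels whose class disagrees with the majority class of their part
        return sum(int(min(p.sum(), p.size - p.sum())) for p in parts)

    candidates = {
        'none':       [labels],
        'horizontal': [labels[:h // 2], labels[h // 2:]],
        'vertical':   [labels[:, :w // 2], labels[:, w // 2:]],
    }
    return min(candidates, key=lambda k: impurity(candidates[k]))
```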

In the process to be performed for every sub-block, first, one depth value is determined for the sub-block sblk using the reference depth map (step S1403). Specifically, one depth value is determined from the depths for the pixels within the block sblk+dv on the reference depth map indicated by the disparity vector dv set in step S1401.

Various methods may be used as the method of determining one depth from the depths for the pixels within a block. However, it is necessary to use the same method as that of the decoding side. For example, any one of the average value, the maximum value, the minimum value, and the median value among the depth values for the pixels within the block may be used. In addition, any one of the average value, the maximum value, the minimum value, and the median value among the depth values for the pixels at the four apexes of the block may be used. Further, the depth value at a specific position (top left, center, or the like) of the block may be used.

Also, when the disparity vector dv is given with fractional-pixel precision and a depth value for a certain position within the block is used, a depth value for that position is absent in the reference depth map. In this case, a depth value for the corresponding fractional pixel position may be obtained through interpolation and used, or a depth value for an integer pixel position obtained by performing a rounding operation may be used.

When the depth value has been obtained for the sub-block sblk, the disparity vector sdv (image reference disparity vector) for the reference image is then obtained from the depth value (step S1404). The conversion from the depth value into the disparity vector is performed according to the definition of the camera parameters or the like. Also, when a coordinate value for the sub-block is necessary, the pixel position of a specific position such as the top left of the sub-block or the center position of the sub-block may be used. In addition, when the cameras are one-dimensionally disposed in parallel, it is possible to obtain a disparity vector from a depth value by referring to a lookup table created in advance, because the disparity direction depends upon the camera layout and the disparity amount depends upon the depth value regardless of the sub-block position.
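The lookup-table approach for one-dimensionally parallel cameras might be sketched as follows; the focal length f (in pixels), baseline B, and the inverse-depth quantization of 8-bit depth levels into distances are assumptions of the sketch:

```python
import numpy as np


def build_disparity_lut(f, B, z_near, z_far, levels=256):
    """Tabulate the disparity magnitude for every quantized depth level."""
    lut = np.empty(levels)
    for v in range(levels):
        # assumed inverse-depth quantization: level v -> distance Z
        inv_z = (v / (levels - 1)) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
        lut[v] = f * B * inv_z  # disparity d = f * B / Z
    return lut

# usage: sdv = np.array([lut[depth_value], 0.0])
# (the disparity direction follows from the camera layout)
```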

Next, a disparity-compensated image (inter-view predicted image) for the sub-block sblk is generated using the obtained disparity vector sdv and the reference image (step S1405, inter-view prediction step). Here, the process can use a method similar to conventional disparity-compensated prediction or motion-compensated prediction using only the given vector and the reference image.
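For integer-pixel vectors, this compensation is a plain displaced copy, as in the following sketch; fractional-pixel vectors would additionally require the interpolation filters of conventional motion compensation, and bounds handling is omitted:

```python
import numpy as np


def disparity_compensate(reference, y, x, h, w, sdv):
    """Copy the h-by-w block of the reference image displaced by sdv = (dx, dy)
    from the sub-block position (y, x) to form the predicted sub-block."""
    dy, dx = int(sdv[1]), int(sdv[0])
    return reference[y + dy:y + dy + h, x + dx:x + dx + w].copy()
```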

Also, the process implemented by steps S1404 and S1405 is an example of a process of generating a view-synthesized image when one depth value is given for a sub-block. Here, any method may be used as long as a view-synthesized image can be generated from one depth value given for the sub-block. For example, a corresponding region (which is not required to have the same shape or size as the sub-block) on the reference image may be identified by assuming that the sub-block belongs to one depth plane, and the view-synthesized image may be generated by warping the reference image for the corresponding region.

In addition, because there is an error in the modeling of the projection model of a camera, in the parallelization (rectification) of a multi-view image, in a depth, or the like, this error is included in a disparity vector obtained from the depth based on the camera parameters. In order to compensate for this error, a correction vector cmv may be used on the reference image for the disparity vector sdv. In this case, in step S1405, a disparity-compensated image is generated using the vector sdv+cmv as the disparity vector. Also, although any vector may be specified as the correction vector, an efficient correction vector can be set by minimizing the error between the disparity-compensated image and the encoding target image in the encoding target region, or the rate-distortion cost in the encoding target region.
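On the encoder side, one way to choose such a correction vector is an exhaustive small-window search, sketched below with a sum-of-absolute-differences cost; the window radius is an assumption, and a rate-distortion cost could replace the SAD as noted above:

```python
import numpy as np


def search_correction_vector(target_block, reference, y, x, sdv, radius=2):
    """Find cmv minimizing the SAD between the block compensated by sdv + cmv
    and the encoding target block."""
    h, w = target_block.shape
    best_cmv, best_cost = np.zeros(2, dtype=int), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + int(sdv[1]) + dy, x + int(sdv[0]) + dx
            if (yy < 0 or xx < 0 or yy + h > reference.shape[0]
                    or xx + w > reference.shape[1]):
                continue  # candidate block falls outside the reference image
            cand = reference[yy:yy + h, xx:xx + w]
            cost = np.abs(target_block.astype(np.int64)
                          - cand.astype(np.int64)).sum()
            if cost < best_cost:
                best_cost, best_cmv = cost, np.array([dx, dy])
    return best_cmv  # compensation then uses the vector sdv + cmv
```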

When the same correction vector as that of the decoding side is obtained, an arbitrary vector may be used. For example, an arbitrary vector may be set and the decoding side may be notified of the set vector by encoding the vector. When the vector is encoded and transmitted, it is possible to suppress the bit amount necessary for the encoding by setting one correction vector for every block blk.

Also, when the correction vector is encoded, a vector is decoded at an appropriate timing (for every sub-block or every block) from the bitstream in the decoding side, and the decoded vector is used as the correction vector.

When information related to a used inter-camera predicted image is accumulated for every block or sub-block, information indicating that a view-synthesized image using the depth has been referred to may be accumulated, or information (the image reference disparity vector) used when the inter-camera predicted image was actually generated may be accumulated (image reference disparity vector accumulation step). Also, the accumulated information is referred to when another block or another frame is encoded or decoded. For example, when vector information (a vector or the like to be used in disparity-compensated prediction) for a certain block is encoded or decoded, only a difference from predicted vector information may be encoded or decoded by generating the predicted vector information from vector information accumulated in an already encoded block around the block. As another example, the disparity vector dv for a certain block may be set using vector information accumulated in an already encoded or decoded block around the block.

As information indicating that a view-synthesized image using a depth has been referred to, corresponding prediction mode information may be accumulated. Information corresponding to an inter-frame prediction mode may be accumulated as the prediction mode. At this time, reference frame information corresponding to the view-synthesized image may be accumulated as the reference frame. In addition, as the vector information, the disparity vector dv may be accumulated, or the disparity vector dv and the correction vector cmv may be accumulated.

As information used when the inter-camera predicted image is actually generated, the information corresponding to the inter-frame prediction mode may be accumulated as the prediction mode. At this time, the reference image may be accumulated as the reference frame. In addition, the disparity vector sdv for the reference image or the corrected disparity vector sdv+cmv for the reference image may be accumulated for every sub-block as the vector information. Also, there is a case in which two or more vectors are used within a sub-block, such as a case in which warping or the like is used. In this case, all the vectors may be accumulated, or one vector may be selected and accumulated for every sub-block according to a predetermined method. As a method of selecting one vector, for example, there is a method of selecting the vector whose disparity amount is largest, or a method of selecting the vector at a specific position (upper left or the like) of the sub-block.

Next, an image decoding apparatus will be described. FIG. 7 is a block diagram illustrating a configuration of the image decoding apparatus in this embodiment. The image decoding apparatus 200 includes a bitstream input unit 201, a bitstream memory 202, a reference image input unit 203, a reference image memory 204, a reference depth map input unit 205, a reference depth map memory 206, a disparity vector setting unit 207, a view-synthesized image generating unit 208, and an image decoding unit 209.

The bitstream input unit 201 inputs a bitstream of encoded data obtained by encoding an image serving as a decoding target. Hereinafter, the image serving as the decoding target is referred to as a decoding target image. Here, the image of the camera B is assumed to be the decoding target. In addition, hereinafter, a camera (here, the camera B) capturing the decoding target image is referred to as a decoding target camera.

The bitstream memory 202 stores the bitstream for the input decoding target image. The reference image input unit 203 inputs an image to be referred to when the view-synthesized image (disparity-compensated image) is generated. Hereinafter, the image input here is referred to as a reference image. Here, an image of the camera A is assumed to be input. The reference image memory 204 stores the input reference image. Hereinafter, the camera (here, the camera A) capturing the reference image is referred to as a reference camera.

The reference depth map input unit 205 inputs a depth map to be referred to when the view-synthesized image is generated. Here, although the depth map for the reference image is assumed to be input, the depth map may be a depth map for another camera. Hereinafter, this depth map is referred to as a reference depth map.

The depth map represents the three-dimensional position of the object shown in each pixel of the corresponding image. As long as the three-dimensional position is obtained through information such as a separately given camera parameter, any information may be used. For example, it is possible to use the distance from the camera to the object, a coordinate value for an axis which is not parallel to the image plane, or a disparity amount for another camera (for example, the camera B). In addition, because it is only necessary to obtain the disparity amount here, a disparity map directly representing the disparity amount rather than the depth map may be used.

Also, although the depth map is given in the form of an image here, the depth map may not be configured in the form of an image as long as similar information is obtained. The reference depth map memory 206 stores the input reference depth map. Hereinafter, a camera (here, the camera A) corresponding to the reference depth map is referred to as a reference depth camera.

The disparity vector setting unit 207 sets the disparity vector for the reference depth map for every decoding target image or every block obtained by dividing the decoding target image. The view-synthesized image generating unit 208 (inter-view prediction unit) obtains a corresponding relationship between a pixel of the decoding target image and a pixel of the reference image using the reference depth map and generates a view-synthesized image for the decoding target image. The image decoding unit 209 outputs a decoded image by decoding the decoding target image from the bitstream using the view-synthesized image.

Next, an operation of the image decoding apparatus 200 illustrated in FIG. 7 will be described with reference to FIG. 8. FIG. 8 is a flowchart illustrating the operation of the image decoding apparatus 200 illustrated in FIG. 7.

The bitstream input unit 201 inputs a bitstream obtained by encoding a decoding target image and stores the input bitstream in the bitstream memory 202 (step S21). In parallel with this, the reference image input unit 203 inputs a reference image and stores the input reference image in the reference image memory 204. In addition, the reference depth map input unit 205 inputs a reference depth map and stores the input reference depth map in the reference depth map memory 206 (step S22).

The reference image and the reference depth map input in step S22 are assumed to be the same as those used in the encoding side. This is because the occurrence of encoding noise such as drift is suppressed by using exactly the same information as that used by the image encoding apparatus 100. However, when the occurrence of such encoding noise is allowed, content different from the content used at the time of encoding may be input. In relation to the reference depth map, in addition to separately decoded content, a depth map estimated by applying stereo matching or the like to a multi-view image decoded for a plurality of cameras, a depth map estimated using a decoded disparity vector, motion vector, or the like, or the like may be used.

Next, the image decoding apparatus 200 decodes the decoding target image from the bitstream while creating a view-synthesized image for every block obtained by dividing the decoding target image. That is, after a variable blk indicating an index of a block of the decoding target image is initialized to 0 (step S23), the following process (steps S24 and S25) is iterated until blk reaches numBlks (step S27) while blk is incremented by 1 (step S26). Also, numBlks indicates the number of unit blocks on which the decoding process is performed in the decoding target image.

In the process to be performed for every block of the decoding target image, first, a view-synthesized image for the block blk is generated in the disparity vector setting unit 207 (reference depth region setting unit) and the view-synthesized image generating unit 208 (inter-view prediction unit) (step S24). Because the process here is the same as step S14 illustrated in FIG. 2 described above (steps S1401 to S1407 illustrated in FIG. 3), detailed description thereof is omitted. The sub-block division (a region division within the decoding target region) by the view-synthesized image generating unit 208 serving as a predicted region division setting unit and the process to be performed for every sub-block are also similar.

Next, when the disparity-compensated image has been obtained, the image decoding unit 209 decodes the decoding target image from the bitstream and outputs the decoded image while using the view-synthesized image as a predicted image (step S25). The decoded image obtained as a result of decoding becomes an output of the image decoding apparatus 200. Also, as long as the bitstream can be correctly decoded, any method may be used in decoding. In general, a method corresponding to that used at the time of encoding is used.

When encoding was performed by general moving-image encoding or image encoding such as MPEG-2, H.264, or JPEG, decoding is performed by performing entropy decoding, inverse binarization, inverse quantization, and the like for every block, obtaining a prediction residual signal by performing an inverse frequency transform such as an inverse discrete cosine transform (IDCT), adding the predicted image to the prediction residual signal, and clipping the result in the pixel value range.
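Mirroring the encoder-side sketch given earlier, the corresponding block decoding path could look as follows; the quantization step must match the encoder's, and an 8-bit pixel range is assumed:

```python
import numpy as np
from scipy.fftpack import idct


def decode_block(quantized_coeffs, predicted_block, qstep=8.0):
    """Dequantize -> inverse 2-D DCT -> add prediction -> clip to pixel range."""
    coeffs = quantized_coeffs.astype(np.float64) * qstep
    residual = idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')
    return np.clip(np.round(residual + predicted_block), 0, 255).astype(np.uint8)
```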

Although a view-synthesized image is used as a predicted image in all blocks in this embodiment, an image generated by a different method for every block may be used as a predicted image. In this case, it is necessary to discriminate the method by which a generated image is used as a predicted image and to use an appropriate predicted image. For example, when information indicating the method (mode or vector information or the like) of generating the predicted image is encoded and the encoded information is included in the bitstream, as in H.264, an appropriate predicted image may be selected and used in decoding by decoding that information.

Also, although the detailed process of step S24 illustrated in FIG. 8 is equivalent to the processing operation illustrated in FIG. 3, it is necessary to perform the same process as that of the encoding side in step S1401, in which the disparity vector dv is set. As one method, there is a case in which the disparity vector dv is multiplexed into the bitstream. In this case, as illustrated in FIG. 9, it is only necessary for the image decoding apparatus 200 to further include a bitstream separating unit 210 and a disparity vector decoding unit 211 (depth reference disparity vector setting unit). FIG. 9 is a block diagram illustrating a modified example of the image decoding apparatus 200 illustrated in FIG. 7.

The bitstream separating unit 210 separates the input bitstream into a bitstream for the disparity vector dv and a bitstream for the decoding target image. In addition, the disparity vector decoding unit 211 decodes the disparity vector dv from the separated bitstream. The decoded disparity vector is used in the view-synthesized image generating unit 208. That is, after the disparity vector is decoded for every block blk as illustrated in FIG. 10 (step S28), the generation of the view-synthesized image (step S24) and the decoding of the decoding target image (step S25) are performed. FIG. 10 is a flowchart illustrating a modified example of the operation illustrated in FIG. 8.

Also, a decoded global disparity vector may be used as the same disparity vector in all blocks within the frame or the slice by decoding the global disparity vector for every large unit such as a frame or slice, without decoding a disparity vector for every block. In this case, as illustrated in FIG. 11, before the process to be performed for every block, the disparity vector for the reference depth map is decoded (step S29). FIG. 11 is a flowchart illustrating another modified example of the operation illustrated in FIG. 8.

As still another method, the disparity vector for the block blk may be set from vector information decoded in a block decoded before the block blk is decoded.

Specifically, when disparity-compensated prediction is used when a block, a frame, or the like spatially or temporally adjacent to the block blk has been decoded, some disparity vectors are decoded in that block. Accordingly, the disparity vector for the block blk may be obtained from these disparity vectors according to a predetermined method.

As the predetermined method, there is a method of performing median prediction from the disparity vectors of adjacent blocks or a method of using the disparity vector of a specific block without change. In this case, as illustrated in FIG. 12, it is only necessary for the image decoding apparatus 200 to further include a vector information memory 212 (image reference disparity vector accumulation unit). FIG. 12 is a block diagram illustrating another modified example of the image decoding apparatus 200 illustrated in FIG. 7. The vector information memory 212 accumulates the vector information used when the image decoding unit 209 generates a predicted image. The accumulated vector information is used when the disparity vector setting unit 207 sets a disparity vector for another block blk.

In addition, this method may be combined with the above-described method of setting an arbitrary vector as the disparity vector by decoding it. For example, a vector decoded from the bitstream may be added to a vector estimated from the vector information decoded in a block decoded before the block blk, and the added vector may be set as the disparity vector dv. Also, as described above, the disparity vector dv may be obtained from the depth value of the reference depth map at the same position as the block blk.

Although a process of encoding and decoding all pixels of one frame has been described above, encoding or decoding may be performed by applying the process of the embodiment of the present invention to only some pixels while employing intra-screen predictive encoding, motion-compensated predictive encoding, or other schemes used in H.264/AVC and the like for the remaining pixels. In this case, it is necessary to encode and decode information representing the prediction method used for every pixel. In addition, encoding or decoding may be performed using a separate prediction scheme for every block rather than for every pixel. Also, when prediction using a view-synthesized image is performed only for some pixels or blocks, the calculation amount of the view synthesizing process can be reduced by performing the process of generating the view-synthesized image (step S14 illustrated in FIG. 2 and steps S24 and S28 illustrated in FIG. 8) only for those pixels.
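As a sketch of such per-block scheme selection, the following dispatches on a decoded mode flag; the mode names and the two fallback predictors are illustrative stubs rather than the H.264/AVC algorithms, and synthesize_block is reused from the earlier sketch. Only blocks whose mode selects view synthesis run steps S24/S28, so the cost of the view synthesizing process scales with the number of such blocks.

```python
import numpy as np

def intra_predict_stub(blk_rect):
    # Illustrative stand-in for intra-screen prediction: a flat DC block.
    y, x, h, w = blk_rect
    return np.full((h, w), 128)

def motion_compensate_stub(blk_rect, prev_frame, mv):
    # Illustrative stand-in for motion-compensated prediction.
    y, x, h, w = blk_rect
    return prev_frame[y + mv[1]:y + mv[1] + h, x + mv[0]:x + mv[0] + w]

def predict_block(mode, blk_rect, ctx):
    if mode == "view_synthesis":
        # Steps S24/S28 run only for blocks that select this mode.
        return synthesize_block(ctx["ref_image"], ctx["ref_depth"],
                                blk_rect, ctx["dv"])
    if mode == "intra":
        return intra_predict_stub(blk_rect)
    return motion_compensate_stub(blk_rect, ctx["prev_frame"], ctx["mv"])
```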

In addition, although a process of encoding and decoding one frame has been described above, the present invention can also be applied to the encoding of moving images by iterating the process over a plurality of frames. The present invention is also applicable to only some frames or some blocks of a moving image. Further, although the configurations and processing operations of the image encoding apparatus and the image decoding apparatus have been described above, the image encoding method and the image decoding method of the present invention can be implemented by processing operations corresponding to the operations of the respective units of the image encoding apparatus and the image decoding apparatus.

Further, although the reference depth map has been described above as a depth map for an image captured by a camera different from the encoding target camera or the decoding target camera, a depth map for an image captured by the encoding target camera or the decoding target camera at a time different from that of the encoding target image or the decoding target image may also be used as the reference depth map. In this case, a motion vector rather than a disparity vector is set or decoded in steps S1401, S18, S28, and S29.

FIG. 13 is a block diagram illustrating a hardware configuration when the above-described image encoding apparatus 100 is constituted of a computer and a software program.

The system illustrated in FIG. 13 has a configuration in which a central processing unit (CPU) 50 configured to execute the program, a memory 51, an encoding target image input unit 52, a reference image input unit 53, a reference depth map input unit 54, a program storage apparatus 55, and a bitstream output unit 56 are connected through a bus.

The memory 51, such as a random access memory (RAM), stores the program and data to be accessed by the CPU 50. The encoding target image input unit 52 inputs an image signal of an encoding target from a camera or the like. The encoding target image input unit 52 may be a storage unit such as a disc apparatus configured to store an image signal. The reference image input unit 53 inputs an image signal of a reference target from a camera or the like. The reference image input unit 53 may be a storage unit such as a disc apparatus configured to store an image signal. The reference depth map input unit 54 inputs, from a depth camera or the like, a depth map for a camera at a position or in a direction different from those of the camera capturing the encoding target image. The reference depth map input unit 54 may be a storage unit such as a disc apparatus configured to store the depth map. The program storage apparatus 55 stores an image encoding program 551, which is a software program for causing the CPU 50 to execute the above-described image encoding process. The bitstream output unit 56 outputs a bitstream generated by the CPU 50 executing the image encoding program 551 loaded into the memory 51, for example, via a network. The bitstream output unit 56 may be a storage unit such as a disc apparatus configured to store the bitstream.

FIG. 14 is a block diagram illustrating a hardware configuration when the above-described image decoding apparatus 200 is constituted of a computer and a software program.

The system illustrated in FIG. 14 has a configuration in which a CPU 60 configured to execute the program, a memory 61, a bitstream input unit 62, a reference image input unit 63, a reference depth map input unit 64, a program storage apparatus 65, and a decoding target image output unit 66 are connected through a bus.

The memory 61, such as a RAM, stores the program and data to be accessed by the CPU 60. The bitstream input unit 62 inputs a bitstream encoded by the image encoding apparatus according to this technique. The bitstream input unit 62 may be a storage unit such as a disc apparatus configured to store an image signal. The reference image input unit 63 inputs an image signal of a reference target from a camera or the like. The reference image input unit 63 may be a storage unit such as a disc apparatus configured to store an image signal. The reference depth map input unit 64 inputs, from a depth camera or the like, a depth map for a camera at a position or in a direction different from those of the camera capturing the decoding target image. The reference depth map input unit 64 may be a storage unit such as a disc apparatus configured to store the depth information. The program storage apparatus 65 stores an image decoding program 651, which is a software program for causing the CPU 60 to execute the above-described image decoding process. The decoding target image output unit 66 outputs a decoding target image, obtained by decoding the bitstream through the CPU 60 executing the image decoding program 651 loaded into the memory 61, to a reproduction apparatus or the like. The decoding target image output unit 66 may be a storage unit such as a disc apparatus configured to store the image signal.

In addition, the image encoding process and the image decoding process may be executed by recording a program for implementing the functions of the processing units in the image encoding apparatus 100 illustrated in FIG. 1 and the image decoding apparatus 200 illustrated in FIG. 7 on a computer-readable recording medium and causing a computer system to read and execute the program recorded on the recording medium. The "computer system" used here is assumed to include an operating system (OS) and hardware such as peripheral devices. In addition, the computer system is assumed to include a homepage providing environment (or displaying environment) when a World Wide Web (WWW) system is used. The computer-readable recording medium refers to a storage apparatus including a flexible disk, a magneto-optical disc, a read only memory (ROM), a portable medium such as a compact disc (CD)-ROM, and a hard disk embedded in the computer system. Furthermore, the "computer-readable recording medium" is assumed to include a medium that holds a program for a certain period of time, such as a volatile memory (RAM) inside a computer system serving as a server or a client when the program is transmitted via a network such as the Internet or a communication circuit such as a telephone circuit.

In addition, the above-described program may be transmitted from a computer system storing the program in a storage apparatus or the like to another computer system via a transmission medium or by transmission waves in the transmission medium. Here, the "transmission medium" for transmitting the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication circuit (communication line) like a telephone circuit. In addition, the above-described program may be a program for implementing some of the above-described functions. Further, the above-described program may be a so-called differential file (differential program), i.e., a program capable of implementing the above-described functions in combination with a program already recorded on the computer system.

While the embodiments of the invention have been described above with reference to the drawings, it should be understood that these are exemplary of the invention and are not to be considered as limiting. Accordingly, additions, omissions, substitutions, and other modifications of constituent elements may be made without departing from the spirit or scope of the present invention.

INDUSTRIAL APPLICABILITY

The present invention can be used to achieve high encoding efficiency with a small calculation amount when disparity-compensated prediction is performed on an encoding (decoding) target image using a depth map for an image captured from a position different from that of the camera capturing the encoding (decoding) target image.

DESCRIPTION OF REFERENCE SYMBOLS

- 101 Encoding target image input unit
- 102 Encoding target image memory
- 103 Reference image input unit
- 104 Reference image memory
- 105 Reference depth map input unit
- 106 Reference depth map memory
- 107 Disparity vector setting unit
- 108 View-synthesized image generating unit
- 109 Image encoding unit
- 110 Disparity vector encoding unit
- 111 Multiplexing unit
- 112 Vector information memory
- 201 Bitstream input unit
- 202 Bitstream memory
- 203 Reference image input unit
- 204 Reference image memory
- 205 Reference depth map input unit
- 206 Reference depth map memory
- 207 Disparity vector setting unit
- 208 View-synthesized image generating unit
- 209 Image decoding unit
- 210 Bitstream separating unit
- 211 Disparity vector decoding unit
- 212 Vector information memory

CLAIMS

1. An image decoding apparatus which performs decoding while predicting an image between different views using a reference image decoded for a view different from a decoding target image and a reference depth map which is a depth map for an object of the reference image when the decoding target image is decoded from encoded data of a multi-view image including images of a plurality of different views, the image decoding apparatus comprising: a reference depth region setting unit configured to set a reference depth region which is a corresponding region on the reference depth map for decoding target regions into which the decoding target image is divided; an inter-view prediction unit configured to set an image reference disparity vector which is a disparity vector for the reference image using depth information in the reference depth region and generate an inter-view predicted image for the decoding target region from the reference image and the image reference disparity vector; and an image decoding unit configured to decode the decoding target image from the inter-view predicted image and the encoded data, wherein a view of the reference depth map is different from a view of the decoding target image.
2. The image decoding apparatus according to claim 1, further comprising: a depth reference disparity vector setting unit configured to set a depth reference disparity vector which is a disparity vector for the reference depth map with respect to the decoding target region, wherein the reference depth region setting unit sets a region indicated by the depth reference disparity vector as the reference depth region.
3. The image decoding apparatus according to claim 2, wherein the depth reference disparity vector setting unit sets the depth reference disparity vector using a disparity vector used when a region adjacent to the decoding target region is decoded.
4. The image decoding apparatus according to claim 2, wherein the depth reference disparity vector setting unit sets the depth reference disparity vector using depth information for a region on the reference depth map having the same position as the decoding target region.
5. The image decoding apparatus according to claim 1, wherein the inter-view prediction unit sets a representative depth using depth information within the corresponding reference depth region for every predicted region obtained by dividing the decoding target region and generates the inter-view predicted image for the decoding target region by generating a view-synthesized image from the representative depth and the reference image.
6. The image decoding apparatus according to claim 1, wherein the inter-view prediction unit sets the image reference disparity vector which is a disparity vector for the reference image using depth information within the corresponding reference depth region for every predicted region obtained by dividing the decoding target region and generates the inter-view predicted image for the decoding target region by generating a disparity-compensated image using the image reference disparity vector and the reference image.
7. The image decoding apparatus according to claim 6, further comprising: an image reference disparity vector accumulation unit configured to accumulate the image reference disparity vector; and a disparity prediction unit configured to generate predicted disparity information for a region adjacent to the decoding target region using the accumulated image reference disparity vector.
8. The image decoding apparatus according to claim 7, wherein the disparity prediction unit generates a depth reference disparity vector for a region adjacent to the decoding target region.
9. The image decoding apparatus according to claim 6, further comprising: a correction disparity vector setting unit configured to set a correction disparity vector which is a vector for correcting the image reference disparity vector, wherein the inter-view prediction unit generates the inter-view predicted image by generating a disparity-compensated image using the reference image and a vector obtained by correcting the image reference disparity vector through the correction disparity vector.
10. The image decoding apparatus according to claim 9, wherein the correction disparity vector setting unit sets one vector as the correction disparity vector for the decoding target region.
11. The image decoding apparatus according to claim 5, further comprising: a predicted region division setting unit configured to set region divisions within the decoding target region based on depth information within the reference depth region, wherein the inter-view prediction unit designates a region obtained according to the region division as the predicted region.
12. The image decoding apparatus according to claim 6, further comprising: a predicted region division setting unit configured to set region divisions within the decoding target region based on depth information within the reference depth region, wherein the inter-view prediction unit designates a region obtained according to the region division as the predicted region.
13. An image decoding method which performs decoding while predicting an image between different views using a reference image decoded for a view different from a decoding target image and a reference depth map which is a depth map for an object of the reference image when the decoding target image is decoded from encoded data of a multi-view image including images of a plurality of different views, the image decoding method comprising: a reference depth region setting step of setting a reference depth region which is a corresponding region on the reference depth map for decoding target regions into which the decoding target image is divided; an image reference disparity vector setting step of setting an image reference disparity vector which is a disparity vector for the reference image using depth information in the reference depth region; an inter-view prediction step of generating an inter-view predicted image for the decoding target region from the reference image and the image reference disparity vector; and an image decoding step of decoding the decoding target image from the inter-view predicted image and the encoded data, wherein a view of the reference depth map is different from a view of the decoding target image.
14. An image encoding apparatus which performs encoding while predicting an image between different views using a reference image encoded for a different view from an encoding target image and a reference depth map which is a depth map for an object of the reference image when a multi-view image including images of a plurality of different views is encoded, the image encoding apparatus comprising: a reference depth region setting unit configured to set a reference depth region which is a corresponding region on the reference depth map for encoding target regions into which the encoding target image is divided; an inter-view prediction unit configured to set an image reference disparity vector which is a disparity vector for the reference image using depth information in the reference depth region and generate an inter-view predicted image for the encoding target region from the reference image and the image reference disparity vector; and an image encoding unit configured to encode the encoding target image using the inter-view predicted image, wherein a view of the reference depth map is different from a view of the encoding target image.
15. An image encoding method which performs encoding while predicting an image between different views using a reference image encoded for a different view from an encoding target image and a reference depth map which is a depth map for an object of the reference image when a multi-view image including images of a plurality of different views is encoded, the image encoding method comprising: a reference depth region setting step of setting a reference depth region which is a corresponding region on the reference depth map for encoding target regions into which the encoding target image is divided; an image reference disparity vector setting step of setting an image reference disparity vector which is a disparity vector for the reference image using depth information in the reference depth region; an inter-view prediction step of generating an inter-view predicted image for the encoding target region from the reference image and the image reference disparity vector; and an image encoding step of encoding the encoding target image using the inter-view predicted image, wherein a view of the reference depth map is different from a view of the encoding target image.
16. A non-transitory computer-readable recording medium having an image decoding program for causing a computer to execute the image decoding method according to claim 13.
17. A non-transitory computer-readable recording medium having an image encoding program for causing a computer to execute the image encoding method according to claim 15.